Topic Segmentation for Short Texts
Abstract
Topic segmentation, which aims to find the boundaries between topic blocks in a text, is an important task in the semantic analysis of texts. Although various solutions have been proposed for the task, existing approaches still face many limitations and difficulties. In particular, most methods do not work well for cases such as short texts, internet news, and students' writings. In this paper, we focus on short texts and present a topic segmentation method that can overcome the limitations of previous work. Preliminary experiments show that the method effectively increases the accuracy of topic segmentation.
Similar resources
A Topic Segmentation of Texts based on Semantic Domains
LIMSI-CNRS, BP 133, 91403 Orsay Cedex, France. Email: [ferret,grau]@limsi.fr. Abstract: Thematic analysis is essential for many Natural Language Processing (NLP) applications, such as text summarization or information extraction. It is a two-dimensional process that has both to delimit the thematic segments of a text and to identify the topic of each of them. The system we present possesses th...
On the contribution of discourse structure to topic segmentation
In this paper, we describe novel methods for topic segmentation based on patterns of discourse organization. Using a corpus of news texts, our results show that it is possible to use discourse features (based on Rhetorical Structure Theory) for topic segmentation and that we outperform some well-known methods.
Topic Modeling over Short Texts by Incorporating Word Embeddings
Inferring topics from the overwhelming amount of short texts becomes a critical but challenging task for many content analysis tasks, such as content charactering, user interest profiling, and emerging topic detecting. Existing methods such as probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA) cannot solve this problem very well since only very limited word co-o...
Исследование специфики применения алгоритмов тематической сегментации для научных текстов (Specifics of Applying Topic Segmentation Algorithms to Scientific Texts)
Language Segmentation of Twitter Tweets using Weakly Supervised Language Model Induction
This paper presents early results of a weakly supervised language model induction approach for language segmentation of multilingual texts with a special focus on short texts.